New Statistical and Syntactic Models for Machine Translation
Abstract
Machine translation (MT) has improved significantly over the past few years, driven mostly by the emergence of statistical machine translation (SMT), which is currently considered the best way to perform MT of natural languages. The main goal of this thesis is to enhance the classical SMT models by introducing syntactic knowledge in a pre-translation step that reorders the source side of the corpus. We are particularly interested in the value of syntax in reordering for languages with high word order disparity. A secondary objective is to determine the potential of different language model (LM) enhancement techniques to improve the performance and efficiency of SMT systems. We start with a comprehensive study of the SMT state of the art, describing the fundamental models underlying the translation process and briefly describing the main methods for automatic evaluation of translation quality. We emphasize phrase-based and N-gram-based SMT, analyzing the major differences between these two approaches. Subsequently, we concentrate on language modeling methods that have not received much attention in the SMT community. We report on experiments in adapting an N-gram-based SMT system to a speech transcription task, describe the positive impact of accurate cut-off threshold selection on both model size and LM noisiness, and finally present a continuous-space LM estimated in the form of an artificial neural network. Moreover, we propose a novel syntax-based approach to the fundamental problem of word ordering in SMT that exploits syntactic representations of source and target texts. The idea of augmenting SMT with a syntax-based reordering step prior to translation, proposed in recent years, has been quite successful in improving translation quality, especially for translation between languages with high word order disparity.
We provide the reader with a thorough study of the state-of-the-art reordering techniques and introduce a new classification of reordering algorithms for SMT. We then propose a
Related resources
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated: the hyphen separates the different parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Example-based Machine Translation Based on Syntactic Transfer with Statistical Models
This paper presents example-based machine translation (MT) based on syntactic transfer, which selects the best translation by using models of statistical machine translation. Example-based MT sometimes generates invalid translations because it selects similar examples to the input sentence based only on source language similarity. The method proposed in this paper selects the best translation b...
Bilingual Structured Language Models for Statistical Machine Translation
This paper describes a novel target-side syntactic language model for phrase-based statistical machine translation, the bilingual structured language model. Our approach represents a new way to adapt structured language models (Chelba and Jelinek, 2000) to statistical machine translation, and a first attempt to adapt them to phrase-based statistical machine translation. We propose a number of variat...
CCG Supertags in Factored Statistical Machine Translation
Combinatory Categorial Grammar (CCG) supertags present phrase-based machine translation with an opportunity to access rich syntactic information at the word level. The challenge is incorporating this information into the translation process. Factored translation models allow the inclusion of supertags as a factor in the source or target language. We show that this results in an improvement in t...
Combining Models for the Alignment of Parallel Syntactic Trees
The alignment of syntactic trees is the task of aligning the internal and leaf nodes of two sentences in different languages structured as trees. The output of the alignment can be used, for instance, as a knowledge resource for learning translation rules (for rule-based machine translation systems) or models (for statistical machine translation systems). This paper presents some experiments carr...
Publication date: 2009